Overview
Dataset statistics
| Number of variables | 17 |
|---|---|
| Number of observations | 10757 |
| Missing cells | 0 |
| Missing cells (%) | 0.0% |
| Duplicate rows | 0 |
| Duplicate rows (%) | 0.0% |
| Total size in memory | 1.7 MiB |
| Average record size in memory | 168.6 B |
Variable types
| DateTime | 2 |
|---|---|
| Numeric | 9 |
| Categorical | 5 |
| Unsupported | 1 |
Alerts
| Alert | Type |
|---|---|
| mta_tax has constant value "0.5" | Constant |
| improvement_surcharge has constant value "0.3" | Constant |
| RatecodeID is highly overall correlated with extra and 3 other fields | High correlation |
| duration_minutes is highly overall correlated with fare_amount and 2 other fields | High correlation |
| extra is highly overall correlated with RatecodeID and 3 other fields | High correlation |
| fare_amount is highly overall correlated with RatecodeID and 4 other fields | High correlation |
| tip_amount is highly overall correlated with total_amount | High correlation |
| total_amount is highly overall correlated with RatecodeID and 5 other fields | High correlation |
| trip_distance is highly overall correlated with RatecodeID and 4 other fields | High correlation |
| RatecodeID is highly imbalanced (94.9%) | Imbalance |
| payment_type is highly imbalanced (52.6%) | Imbalance |
| duration_minutes is highly skewed (γ1 = 20.27967956) | Skewed |
| trip_duration is an unsupported type, check if it needs cleaning or further analysis | Unsupported |
| tip_amount has 3654 (34.0%) zeros | Zeros |
| tolls_amount has 10368 (96.4%) zeros | Zeros |
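The imbalance percentages in the alerts are consistent with one minus the normalized Shannon entropy of the category counts. The report does not state this formula, so the sketch below is an inferred reading; the function name `imbalance_score` is hypothetical and the counts are copied from the RatecodeID and payment_type sections of this report.

```python
import math

def imbalance_score(counts):
    """Return 1 - H / H_max, where H is the Shannon entropy of the class shares.

    Assumption: this mirrors how the imbalance alert is scored; the report
    itself does not spell out the formula.
    """
    total = sum(counts)
    k = len(counts)
    if k < 2:
        return 0.0  # a single class leaves nothing to compare (assumption)
    entropy = -sum((c / total) * math.log2(c / total) for c in counts if c > 0)
    return 1.0 - entropy / math.log2(k)

ratecode_counts = [10653, 101, 3]      # RatecodeID values 1, 2, 4
payment_counts = [7407, 3275, 59, 16]  # payment_type values 1, 2, 3, 4

print(f"RatecodeID:   {imbalance_score(ratecode_counts):.1%}")  # 94.9%
print(f"payment_type: {imbalance_score(payment_counts):.1%}")   # 52.6%
```

A score of 0% would mean perfectly even categories, and values close to 100% mean one category dominates.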
Reproduction
| Analysis started | 2025-12-04 17:00:48.653365 |
|---|---|
| Analysis finished | 2025-12-04 17:00:53.602831 |
| Duration | 4.95 seconds |
| Software version | ydata-profiling v4.18.0 |
| Download configuration | config.json |
Variables
tpep_pickup_datetime
Date
| Distinct | 10752 |
|---|---|
| Distinct (%) | > 99.9% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 426.1 KiB |
| Minimum | 2017-01-01 00:08:25 |
| Maximum | 2017-12-31 23:45:30 |
| Invalid dates | 0 |
| Invalid dates (%) | 0.0% |
tpep_dropoff_datetime
Date
| Distinct | 10752 |
|---|---|
| Distinct (%) | > 99.9% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 426.1 KiB |
| Minimum | 2017-01-01 00:17:20 |
| Maximum | 2017-12-31 23:49:24 |
| Invalid dates | 0 |
| Invalid dates (%) | 0.0% |
passenger_count
Real number (ℝ)
| Distinct | 7 |
|---|---|
| Distinct (%) | 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 1.6362369 |
| Minimum | 0 |
| Maximum | 6 |
| Zeros | 9 |
| Zeros (%) | 0.1% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 426.1 KiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 1 |
| Q1 | 1 |
| median | 1 |
| Q3 | 2 |
| 95-th percentile | 5 |
| Maximum | 6 |
| Range | 6 |
| Interquartile range (IQR) | 1 |
Descriptive statistics
| Standard deviation | 1.2572436 |
|---|---|
| Coefficient of variation (CV) | 0.76837506 |
| Kurtosis | 3.8242532 |
| Mean | 1.6362369 |
| Median Absolute Deviation (MAD) | 0 |
| Skewness | 2.1835228 |
| Sum | 17601 |
| Variance | 1.5806615 |
| Monotonicity | Not monotonic |
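Several of the descriptive statistics above are derived from one another. As a minimal sketch, assuming the usual definitions (CV = standard deviation / mean, variance = standard deviation squared, IQR = Q3 - Q1), the passenger_count figures can be cross-checked as follows; the variable names are illustrative only.

```python
# Values copied from the passenger_count tables above.
mean = 1.6362369
std = 1.2572436
q1, q3 = 1, 2
minimum, maximum = 0, 6

cv = std / mean                  # coefficient of variation
variance = std ** 2              # variance is the squared standard deviation
iqr = q3 - q1                    # interquartile range
value_range = maximum - minimum  # range

print(f"CV       = {cv:.4f}")        # 0.7684
print(f"Variance = {variance:.4f}")  # 1.5807
print(f"IQR = {iqr}, Range = {value_range}")  # IQR = 1, Range = 6
```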
Common values
| Value | Count | Frequency (%) |
|---|---|---|
| 1 | 7572 | 70.4% |
| 2 | 1666 | 15.5% |
| 5 | 542 | 5.0% |
| 3 | 459 | 4.3% |
| 6 | 287 | 2.7% |
| 4 | 222 | 2.1% |
| 0 | 9 | 0.1% |
Minimum 10 values
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 9 | 0.1% |
| 1 | 7572 | 70.4% |
| 2 | 1666 | 15.5% |
| 3 | 459 | 4.3% |
| 4 | 222 | 2.1% |
| 5 | 542 | 5.0% |
| 6 | 287 | 2.7% |
Maximum 10 values
| Value | Count | Frequency (%) |
|---|---|---|
| 6 | 287 | 2.7% |
| 5 | 542 | 5.0% |
| 4 | 222 | 2.1% |
| 3 | 459 | 4.3% |
| 2 | 1666 | 15.5% |
| 1 | 7572 | 70.4% |
| 0 | 9 | 0.1% |
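The frequency table for passenger_count is small enough to rebuild the summary statistics by hand. The sketch below is only a consistency check using the counts listed above; the dictionary name `counts` is illustrative.

```python
# passenger_count value -> number of trips, copied from the table above.
counts = {1: 7572, 2: 1666, 5: 542, 3: 459, 6: 287, 4: 222, 0: 9}

n = sum(counts.values())                                # number of observations
total = sum(value * c for value, c in counts.items())   # the "Sum" statistic
mean = total / n
shares = {value: c / n for value, c in counts.items()}  # frequency per value

print(n)                     # 10757
print(total)                 # 17601
print(round(mean, 4))        # 1.6362
print(f"{shares[1]:.1%}")    # 70.4%
```

The observation count, sum, and mean agree with the values reported for this variable, and the share of single-passenger trips works out to about 70.4%.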
trip_distance
Real number (ℝ)
High correlation
| Distinct | 1124 |
|---|---|
| Distinct (%) | 10.4% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 2.8419513 |
| Minimum | 0 |
| Maximum | 30.83 |
| Zeros | 62 |
| Zeros (%) | 0.6% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 426.1 KiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 0.5 |
| Q1 | 1 |
| median | 1.74 |
| Q3 | 3.24 |
| 95-th percentile | 9.5 |
| Maximum | 30.83 |
| Range | 30.83 |
| Interquartile range (IQR) | 2.24 |
Descriptive statistics
| Standard deviation | 3.1753071 |
|---|---|
| Coefficient of variation (CV) | 1.1172982 |
| Kurtosis | 10.830603 |
| Mean | 2.8419513 |
| Median Absolute Deviation (MAD) | 0.89 |
| Skewness | 2.8922334 |
| Sum | 30570.87 |
| Variance | 10.082575 |
| Monotonicity | Not monotonic |
Common values
| Value | Count | Frequency (%) |
|---|---|---|
| 1 | 252 | 2.3% |
| 1.1 | 235 | 2.2% |
| 0.8 | 220 | 2.0% |
| 0.9 | 215 | 2.0% |
| 1.2 | 209 | 1.9% |
| 0.7 | 203 | 1.9% |
| 1.3 | 185 | 1.7% |
| 1.4 | 183 | 1.7% |
| 0.6 | 175 | 1.6% |
| 1.5 | 172 | 1.6% |
| Other values (1114) | 8708 | 81.0% |
Minimum 10 values
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 62 | 0.6% |
| 0.01 | 3 | < 0.1% |
| 0.02 | 4 | < 0.1% |
| 0.03 | 2 | < 0.1% |
| 0.04 | 2 | < 0.1% |
| 0.06 | 1 | < 0.1% |
| 0.07 | 2 | < 0.1% |
| 0.08 | 1 | < 0.1% |
| 0.1 | 16 | 0.1% |
| 0.11 | 1 | < 0.1% |
Maximum 10 values
| Value | Count | Frequency (%) |
|---|---|---|
| 30.83 | 1 | < 0.1% |
| 27.97 | 1 | < 0.1% |
| 27.88 | 1 | < 0.1% |
| 27.34 | 1 | < 0.1% |
| 26.54 | 1 | < 0.1% |
| 25.86 | 1 | < 0.1% |
| 25.8 | 1 | < 0.1% |
| 24.89 | 1 | < 0.1% |
| 24.61 | 1 | < 0.1% |
| 24.1 | 1 | < 0.1% |
RatecodeID
Categorical
High correlation Imbalance
| Distinct | 3 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 867.3 KiB |
| Value | Count |
|---|---|
| 1 | 10653 |
| 2 | 101 |
| 4 | 3 |
Length
| Max length | 1 |
|---|---|
| Median length | 1 |
| Mean length | 1 |
| Min length | 1 |
Unique
| Unique | 0 |
|---|---|
| Unique (%) | 0.0% |
Sample
| 1st row | 1 |
|---|---|
| 2nd row | 1 |
| 3rd row | 1 |
| 4th row | 1 |
| 5th row | 1 |
Common Values
| Value | Count | Frequency (%) |
|---|---|---|
| 1 | 10653 | 99.0% |
| 2 | 101 | 0.9% |
| 4 | 3 | < 0.1% |
Length
Common Values (Plot)
| Value | Count | Frequency (%) |
|---|---|---|
| 1 | 10653 | 99.0% |
| 2 | 101 | 0.9% |
| 4 | 3 | < 0.1% |
Most occurring characters
| Value | Count | Frequency (%) |
|---|---|---|
| 1 | 10653 | 99.0% |
| 2 | 101 | 0.9% |
| 4 | 3 | < 0.1% |
Most occurring categories
| Value | Count | Frequency (%) |
|---|---|---|
| (unknown) | 10757 | 100.0% |
Most frequent character per category
(unknown)
| Value | Count | Frequency (%) |
|---|---|---|
| 1 | 10653 | 99.0% |
| 2 | 101 | 0.9% |
| 4 | 3 | < 0.1% |
Most occurring scripts
| Value | Count | Frequency (%) |
|---|---|---|
| (unknown) | 10757 | 100.0% |
Most frequent character per script
(unknown)
| Value | Count | Frequency (%) |
|---|---|---|
| 1 | 10653 | 99.0% |
| 2 | 101 | 0.9% |
| 4 | 3 | < 0.1% |
Most occurring blocks
| Value | Count | Frequency (%) |
|---|---|---|
| (unknown) | 10757 | 100.0% |
Most frequent character per block
(unknown)
| Value | Count | Frequency (%) |
|---|---|---|
| 1 | 10653 | 99.0% |
| 2 | 101 | 0.9% |
| 4 | 3 | < 0.1% |
PULocationID
Real number (ℝ)
| Distinct | 125 |
|---|---|
| Distinct (%) | 1.2% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 160.56912 |
| Minimum | 4 |
| Maximum | 265 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 426.1 KiB |
Quantile statistics
| Minimum | 4 |
|---|---|
| 5-th percentile | 48 |
| Q1 | 113 |
| median | 161 |
| Q3 | 231 |
| 95-th percentile | 249 |
| Maximum | 265 |
| Range | 261 |
| Interquartile range (IQR) | 118 |
Descriptive statistics
| Standard deviation | 66.02353 |
|---|---|
| Coefficient of variation (CV) | 0.41118449 |
| Kurtosis | -0.94669805 |
| Mean | 160.56912 |
| Median Absolute Deviation (MAD) | 68 |
| Skewness | -0.19541278 |
| Sum | 1727242 |
| Variance | 4359.1065 |
| Monotonicity | Not monotonic |
Common values
| Value | Count | Frequency (%) |
|---|---|---|
| 230 | 411 | 3.8% |
| 48 | 411 | 3.8% |
| 234 | 406 | 3.8% |
| 79 | 402 | 3.7% |
| 162 | 400 | 3.7% |
| 161 | 395 | 3.7% |
| 237 | 366 | 3.4% |
| 186 | 349 | 3.2% |
| 170 | 341 | 3.2% |
| 163 | 318 | 3.0% |
| Other values (115) | 6958 | 64.7% |
Minimum 10 values
| Value | Count | Frequency (%) |
|---|---|---|
| 4 | 30 | 0.3% |
| 7 | 17 | 0.2% |
| 10 | 1 | < 0.1% |
| 12 | 2 | < 0.1% |
| 13 | 75 | 0.7% |
| 14 | 2 | < 0.1% |
| 17 | 5 | < 0.1% |
| 24 | 22 | 0.2% |
| 25 | 16 | 0.1% |
| 28 | 1 | < 0.1% |
Maximum 10 values
| Value | Count | Frequency (%) |
|---|---|---|
| 265 | 2 | < 0.1% |
| 264 | 182 | 1.7% |
| 263 | 166 | 1.5% |
| 262 | 64 | 0.6% |
| 261 | 52 | 0.5% |
| 260 | 8 | 0.1% |
| 258 | 1 | < 0.1% |
| 256 | 9 | 0.1% |
| 255 | 23 | 0.2% |
| 249 | 311 | 2.9% |
DOLocationID
Real number (ℝ)
| Distinct | 194 |
|---|---|
| Distinct (%) | 1.8% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 158.22692 |
| Minimum | 4 |
| Maximum | 265 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 426.1 KiB |
Quantile statistics
| Minimum | 4 |
|---|---|
| 5-th percentile | 41 |
| Q1 | 100 |
| median | 161 |
| Q3 | 233 |
| 95-th percentile | 261 |
| Maximum | 265 |
| Range | 261 |
| Interquartile range (IQR) | 133 |
Descriptive statistics
| Standard deviation | 72.613237 |
|---|---|
| Coefficient of variation (CV) | 0.45891834 |
| Kurtosis | -1.0866039 |
| Mean | 158.22692 |
| Median Absolute Deviation (MAD) | 70 |
| Skewness | -0.2512199 |
| Sum | 1702047 |
| Variance | 5272.6821 |
| Monotonicity | Not monotonic |
Common values
| Value | Count | Frequency (%) |
|---|---|---|
| 48 | 364 | 3.4% |
| 170 | 321 | 3.0% |
| 236 | 320 | 3.0% |
| 230 | 312 | 2.9% |
| 79 | 306 | 2.8% |
| 186 | 292 | 2.7% |
| 239 | 279 | 2.6% |
| 142 | 278 | 2.6% |
| 141 | 262 | 2.4% |
| 107 | 251 | 2.3% |
| Other values (184) | 7772 | 72.3% |
Minimum 10 values
| Value | Count | Frequency (%) |
|---|---|---|
| 4 | 64 | 0.6% |
| 7 | 62 | 0.6% |
| 9 | 2 | < 0.1% |
| 10 | 5 | < 0.1% |
| 11 | 1 | < 0.1% |
| 12 | 6 | 0.1% |
| 13 | 86 | 0.8% |
| 14 | 12 | 0.1% |
| 15 | 3 | < 0.1% |
| 16 | 2 | < 0.1% |
Maximum 10 values
| Value | Count | Frequency (%) |
|---|---|---|
| 265 | 12 | 0.1% |
| 264 | 166 | 1.5% |
| 263 | 213 | 2.0% |
| 262 | 144 | 1.3% |
| 261 | 41 | 0.4% |
| 260 | 15 | 0.1% |
| 259 | 3 | < 0.1% |
| 257 | 16 | 0.1% |
| 256 | 42 | 0.4% |
| 255 | 66 | 0.6% |
payment_type
Categorical
Imbalance
| Distinct | 4 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 867.3 KiB |
| Value | Count |
|---|---|
| 1 | 7407 |
| 2 | 3275 |
| 3 | 59 |
| 4 | 16 |
Length
| Max length | 1 |
|---|---|
| Median length | 1 |
| Mean length | 1 |
| Min length | 1 |
Unique
| Unique | 0 |
|---|---|
| Unique (%) | 0.0% |
Sample
| 1st row | 2 |
|---|---|
| 2nd row | 1 |
| 3rd row | 1 |
| 4th row | 1 |
| 5th row | 1 |
Common Values
| Value | Count | Frequency (%) |
|---|---|---|
| 1 | 7407 | 68.9% |
| 2 | 3275 | 30.4% |
| 3 | 59 | 0.5% |
| 4 | 16 | 0.1% |
Length
Common Values (Plot)
| Value | Count | Frequency (%) |
|---|---|---|
| 1 | 7407 | 68.9% |
| 2 | 3275 | 30.4% |
| 3 | 59 | 0.5% |
| 4 | 16 | 0.1% |
Most occurring characters
| Value | Count | Frequency (%) |
|---|---|---|
| 1 | 7407 | 68.9% |
| 2 | 3275 | 30.4% |
| 3 | 59 | 0.5% |
| 4 | 16 | 0.1% |
Most occurring categories
| Value | Count | Frequency (%) |
|---|---|---|
| (unknown) | 10757 | 100.0% |
Most frequent character per category
(unknown)
| Value | Count | Frequency (%) |
|---|---|---|
| 1 | 7407 | 68.9% |
| 2 | 3275 | 30.4% |
| 3 | 59 | 0.5% |
| 4 | 16 | 0.1% |
Most occurring scripts
| Value | Count | Frequency (%) |
|---|---|---|
| (unknown) | 10757 | 100.0% |
Most frequent character per script
(unknown)
| Value | Count | Frequency (%) |
|---|---|---|
| 1 | 7407 | 68.9% |
| 2 | 3275 | 30.4% |
| 3 | 59 | 0.5% |
| 4 | 16 | 0.1% |
Most occurring blocks
| Value | Count | Frequency (%) |
|---|---|---|
| (unknown) | 10757 | 100.0% |
Most frequent character per block
(unknown)
| Value | Count | Frequency (%) |
|---|---|---|
| 1 | 7407 | 68.9% |
| 2 | 3275 | 30.4% |
| 3 | 59 | 0.5% |
| 4 | 16 | 0.1% |
fare_amount
Real number (ℝ)
High correlation
| Distinct | 128 |
|---|---|
| Distinct (%) | 1.2% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 12.296551 |
| Minimum | 2.5 |
| Maximum | 85.5 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 426.1 KiB |
Quantile statistics
| Minimum | 2.5 |
|---|---|
| 5-th percentile | 4.5 |
| Q1 | 6.5 |
| median | 9.5 |
| Q3 | 14.5 |
| 95-th percentile | 31 |
| Maximum | 85.5 |
| Range | 83 |
| Interquartile range (IQR) | 8 |
Descriptive statistics
| Standard deviation | 9.172736 |
|---|---|
| Coefficient of variation (CV) | 0.74596006 |
| Kurtosis | 7.1611999 |
| Mean | 12.296551 |
| Median Absolute Deviation (MAD) | 3.5 |
| Skewness | 2.3598243 |
| Sum | 132274 |
| Variance | 84.139085 |
| Monotonicity | Not monotonic |
Common values
| Value | Count | Frequency (%) |
|---|---|---|
| 6 | 555 | 5.2% |
| 6.5 | 514 | 4.8% |
| 5.5 | 508 | 4.7% |
| 7 | 506 | 4.7% |
| 7.5 | 487 | 4.5% |
| 5 | 470 | 4.4% |
| 8.5 | 456 | 4.2% |
| 8 | 436 | 4.1% |
| 9 | 435 | 4.0% |
| 9.5 | 411 | 3.8% |
| Other values (118) | 5979 | 55.6% |
Minimum 10 values
| Value | Count | Frequency (%) |
|---|---|---|
| 2.5 | 67 | 0.6% |
| 3 | 47 | 0.4% |
| 3.5 | 147 | 1.4% |
| 4 | 268 | 2.5% |
| 4.5 | 357 | 3.3% |
| 5 | 470 | 4.4% |
| 5.5 | 508 | 4.7% |
| 6 | 555 | 5.2% |
| 6.5 | 514 | 4.8% |
| 7 | 506 | 4.7% |
Maximum 10 values
| Value | Count | Frequency (%) |
|---|---|---|
| 85.5 | 1 | < 0.1% |
| 80 | 1 | < 0.1% |
| 78 | 1 | < 0.1% |
| 76 | 1 | < 0.1% |
| 73 | 1 | < 0.1% |
| 72.5 | 1 | < 0.1% |
| 70.5 | 1 | < 0.1% |
| 67.5 | 1 | < 0.1% |
| 66 | 2 | < 0.1% |
| 64.5 | 1 | < 0.1% |
extra
Categorical
High correlation
| Distinct | 3 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 888.3 KiB |
| Value | Count |
|---|---|
| 0.5 | 7095 |
| 1.0 | 3561 |
| 4.5 | 101 |
Length
| Max length | 3 |
|---|---|
| Median length | 3 |
| Mean length | 3 |
| Min length | 3 |
Unique
| Unique | 0 |
|---|---|
| Unique (%) | 0.0% |
Sample
| 1st row | 0.5 |
|---|---|
| 2nd row | 0.5 |
| 3rd row | 1.0 |
| 4th row | 1.0 |
| 5th row | 1.0 |
Common Values
| Value | Count | Frequency (%) |
|---|---|---|
| 0.5 | 7095 | 66.0% |
| 1.0 | 3561 | 33.1% |
| 4.5 | 101 | 0.9% |
Length
Common Values (Plot)
| Value | Count | Frequency (%) |
|---|---|---|
| 0.5 | 7095 | 66.0% |
| 1.0 | 3561 | 33.1% |
| 4.5 | 101 | 0.9% |
Most occurring characters
| Value | Count | Frequency (%) |
|---|---|---|
| . | 10757 | 33.3% |
| 0 | 10656 | 33.0% |
| 5 | 7196 | 22.3% |
| 1 | 3561 | 11.0% |
| 4 | 101 | 0.3% |
Most occurring categories
| Value | Count | Frequency (%) |
|---|---|---|
| (unknown) | 32271 | 100.0% |
Most frequent character per category
(unknown)
| Value | Count | Frequency (%) |
|---|---|---|
| . | 10757 | 33.3% |
| 0 | 10656 | 33.0% |
| 5 | 7196 | 22.3% |
| 1 | 3561 | 11.0% |
| 4 | 101 | 0.3% |
Most occurring scripts
| Value | Count | Frequency (%) |
|---|---|---|
| (unknown) | 32271 | 100.0% |
Most frequent character per script
(unknown)
| Value | Count | Frequency (%) |
|---|---|---|
| . | 10757 | 33.3% |
| 0 | 10656 | 33.0% |
| 5 | 7196 | 22.3% |
| 1 | 3561 | 11.0% |
| 4 | 101 | 0.3% |
Most occurring blocks
| Value | Count | Frequency (%) |
|---|---|---|
| (unknown) | 32271 | 100.0% |
Most frequent character per block
(unknown)
| Value | Count | Frequency (%) |
|---|---|---|
| . | 10757 | 33.3% |
| 0 | 10656 | 33.0% |
| 5 | 7196 | 22.3% |
| 1 | 3561 | 11.0% |
| 4 | 101 | 0.3% |
mta_tax
Categorical
Constant
| Distinct | 1 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 888.3 KiB |
| Value | Count |
|---|---|
| 0.5 | 10757 |
Length
| Max length | 3 |
|---|---|
| Median length | 3 |
| Mean length | 3 |
| Min length | 3 |
Unique
| Unique | 0 |
|---|---|
| Unique (%) | 0.0% |
Sample
| 1st row | 0.5 |
|---|---|
| 2nd row | 0.5 |
| 3rd row | 0.5 |
| 4th row | 0.5 |
| 5th row | 0.5 |
Common Values
| Value | Count | Frequency (%) |
|---|---|---|
| 0.5 | 10757 | 100.0% |
Length
Common Values (Plot)
| Value | Count | Frequency (%) |
|---|---|---|
| 0.5 | 10757 | 100.0% |
Most occurring characters
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 10757 | 33.3% |
| . | 10757 | 33.3% |
| 5 | 10757 | 33.3% |
Most occurring categories
| Value | Count | Frequency (%) |
|---|---|---|
| (unknown) | 32271 | 100.0% |
Most frequent character per category
(unknown)
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 10757 | 33.3% |
| . | 10757 | 33.3% |
| 5 | 10757 | 33.3% |
Most occurring scripts
| Value | Count | Frequency (%) |
|---|---|---|
| (unknown) | 32271 | 100.0% |
Most frequent character per script
(unknown)
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 10757 | 33.3% |
| . | 10757 | 33.3% |
| 5 | 10757 | 33.3% |
Most occurring blocks
| Value | Count | Frequency (%) |
|---|---|---|
| (unknown) | 32271 | 100.0% |
Most frequent character per block
(unknown)
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 10757 | 33.3% |
| . | 10757 | 33.3% |
| 5 | 10757 | 33.3% |
tip_amount
Real number (ℝ)
High correlation Zeros
| Distinct | 545 |
|---|---|
| Distinct (%) | 5.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 1.7989226 |
| Minimum | 0 |
| Maximum | 42.29 |
| Zeros | 3654 |
| Zeros (%) | 34.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 426.1 KiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 0 |
| Q1 | 0 |
| median | 1.46 |
| Q3 | 2.5 |
| 95-th percentile | 5.75 |
| Maximum | 42.29 |
| Range | 42.29 |
| Interquartile range (IQR) | 2.5 |
Descriptive statistics
| Standard deviation | 2.1840416 |
|---|---|
| Coefficient of variation (CV) | 1.2140832 |
| Kurtosis | 21.73657 |
| Mean | 1.7989226 |
| Median Absolute Deviation (MAD) | 1.46 |
| Skewness | 3.021124 |
| Sum | 19351.01 |
| Variance | 4.7700376 |
| Monotonicity | Not monotonic |
Common values
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 3654 | 34.0% |
| 1 | 708 | 6.6% |
| 2 | 379 | 3.5% |
| 1.5 | 164 | 1.5% |
| 1.66 | 115 | 1.1% |
| 3 | 115 | 1.1% |
| 1.96 | 105 | 1.0% |
| 2.06 | 104 | 1.0% |
| 1.46 | 102 | 0.9% |
| 1.45 | 102 | 0.9% |
| Other values (535) | 5209 | 48.4% |
Minimum 10 values
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 3654 | 34.0% |
| 0.01 | 6 | 0.1% |
| 0.02 | 2 | < 0.1% |
| 0.04 | 1 | < 0.1% |
| 0.08 | 1 | < 0.1% |
| 0.1 | 4 | < 0.1% |
| 0.12 | 1 | < 0.1% |
| 0.2 | 5 | < 0.1% |
| 0.26 | 2 | < 0.1% |
| 0.34 | 1 | < 0.1% |
Maximum 10 values
| Value | Count | Frequency (%) |
|---|---|---|
| 42.29 | 1 | < 0.1% |
| 28 | 1 | < 0.1% |
| 25 | 1 | < 0.1% |
| 22.22 | 1 | < 0.1% |
| 20 | 1 | < 0.1% |
| 18.92 | 1 | < 0.1% |
| 18.56 | 1 | < 0.1% |
| 17.19 | 2 | < 0.1% |
| 15.95 | 1 | < 0.1% |
| 15.76 | 1 | < 0.1% |
tolls_amount
Real number (ℝ)
Zeros
| Distinct | 16 |
|---|---|
| Distinct (%) | 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 0.20604537 |
| Minimum | 0 |
| Maximum | 17.28 |
| Zeros | 10368 |
| Zeros (%) | 96.4% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 426.1 KiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 0 |
| Q1 | 0 |
| median | 0 |
| Q3 | 0 |
| 95-th percentile | 0 |
| Maximum | 17.28 |
| Range | 17.28 |
| Interquartile range (IQR) | 0 |
Descriptive statistics
| Standard deviation | 1.0835814 |
|---|---|
| Coefficient of variation (CV) | 5.2589457 |
| Kurtosis | 32.996151 |
| Mean | 0.20604537 |
| Median Absolute Deviation (MAD) | 0 |
| Skewness | 5.4578182 |
| Sum | 2216.43 |
| Variance | 1.1741486 |
| Monotonicity | Not monotonic |
Common values
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 10368 | 96.4% |
| 5.76 | 273 | 2.5% |
| 5.54 | 94 | 0.9% |
| 2.64 | 6 | 0.1% |
| 2.54 | 5 | < 0.1% |
| 11.52 | 1 | < 0.1% |
| 2.16 | 1 | < 0.1% |
| 8.5 | 1 | < 0.1% |
| 17.28 | 1 | < 0.1% |
| 5.49 | 1 | < 0.1% |
| Other values (6) | 6 | 0.1% |
Minimum 10 values
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 10368 | 96.4% |
| 2.16 | 1 | < 0.1% |
| 2.54 | 5 | < 0.1% |
| 2.64 | 6 | 0.1% |
| 2.7 | 1 | < 0.1% |
| 5.16 | 1 | < 0.1% |
| 5.49 | 1 | < 0.1% |
| 5.54 | 94 | 0.9% |
| 5.76 | 273 | 2.5% |
| 6.32 | 1 | < 0.1% |
Maximum 10 values
| Value | Count | Frequency (%) |
|---|---|---|
| 17.28 | 1 | < 0.1% |
| 16.62 | 1 | < 0.1% |
| 11.52 | 1 | < 0.1% |
| 10.5 | 1 | < 0.1% |
| 8.5 | 1 | < 0.1% |
| 8.4 | 1 | < 0.1% |
| 6.32 | 1 | < 0.1% |
| 5.76 | 273 | 2.5% |
| 5.54 | 94 | 0.9% |
| 5.49 | 1 | < 0.1% |
improvement_surcharge
Categorical
Constant
| Distinct | 1 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 888.3 KiB |
| Value | Count |
|---|---|
| 0.3 | 10757 |
Length
| Max length | 3 |
|---|---|
| Median length | 3 |
| Mean length | 3 |
| Min length | 3 |
Unique
| Unique | 0 |
|---|---|
| Unique (%) | 0.0% |
Sample
| 1st row | 0.3 |
|---|---|
| 2nd row | 0.3 |
| 3rd row | 0.3 |
| 4th row | 0.3 |
| 5th row | 0.3 |
Common Values
| Value | Count | Frequency (%) |
|---|---|---|
| 0.3 | 10757 | 100.0% |
Length
Common Values (Plot)
| Value | Count | Frequency (%) |
|---|---|---|
| 0.3 | 10757 | 100.0% |
Most occurring characters
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 10757 | 33.3% |
| . | 10757 | 33.3% |
| 3 | 10757 | 33.3% |
Most occurring categories
| Value | Count | Frequency (%) |
|---|---|---|
| (unknown) | 32271 | 100.0% |
Most frequent character per category
(unknown)
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 10757 | 33.3% |
| . | 10757 | 33.3% |
| 3 | 10757 | 33.3% |
Most occurring scripts
| Value | Count | Frequency (%) |
|---|---|---|
| (unknown) | 32271 | 100.0% |
Most frequent character per script
(unknown)
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 10757 | 33.3% |
| . | 10757 | 33.3% |
| 3 | 10757 | 33.3% |
Most occurring blocks
| Value | Count | Frequency (%) |
|---|---|---|
| (unknown) | 32271 | 100.0% |
Most frequent character per block
(unknown)
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 10757 | 33.3% |
| . | 10757 | 33.3% |
| 3 | 10757 | 33.3% |
total_amount
Real number (ℝ)
High correlation
| Distinct | 915 |
|---|---|
| Distinct (%) | 8.5% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 15.806325 |
| Minimum | 3.8 |
| Maximum | 111.38 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 426.1 KiB |
Quantile statistics
| Minimum | 3.8 |
|---|---|
| 5-th percentile | 6.3 |
| Q1 | 8.8 |
| median | 12.35 |
| Q3 | 17.88 |
| 95-th percentile | 39.36 |
| Maximum | 111.38 |
| Range | 107.58 |
| Interquartile range (IQR) | 9.08 |
Descriptive statistics
| Standard deviation | 11.304079 |
|---|---|
| Coefficient of variation (CV) | 0.71516174 |
| Kurtosis | 8.6127884 |
| Mean | 15.806325 |
| Median Absolute Deviation (MAD) | 4.05 |
| Skewness | 2.5823137 |
| Sum | 170028.64 |
| Variance | 127.7822 |
| Monotonicity | Not monotonic |
Common values
| Value | Count | Frequency (%) |
|---|---|---|
| 7.3 | 257 | 2.4% |
| 8.3 | 235 | 2.2% |
| 7.8 | 235 | 2.2% |
| 6.8 | 230 | 2.1% |
| 8.8 | 229 | 2.1% |
| 10.3 | 223 | 2.1% |
| 10.8 | 210 | 2.0% |
| 9.3 | 206 | 1.9% |
| 9.8 | 199 | 1.8% |
| 6.3 | 182 | 1.7% |
| Other values (905) | 8551 | 79.5% |
Minimum 10 values
| Value | Count | Frequency (%) |
|---|---|---|
| 3.8 | 38 | 0.4% |
| 4.3 | 38 | 0.4% |
| 4.56 | 1 | < 0.1% |
| 4.75 | 1 | < 0.1% |
| 4.8 | 61 | 0.6% |
| 5 | 2 | < 0.1% |
| 5.15 | 2 | < 0.1% |
| 5.16 | 2 | < 0.1% |
| 5.28 | 2 | < 0.1% |
| 5.3 | 99 | 0.9% |
Maximum 10 values
| Value | Count | Frequency (%) |
|---|---|---|
| 111.38 | 1 | < 0.1% |
| 99.59 | 1 | < 0.1% |
| 92.84 | 1 | < 0.1% |
| 91.9 | 1 | < 0.1% |
| 89.44 | 1 | < 0.1% |
| 89.16 | 1 | < 0.1% |
| 88.56 | 1 | < 0.1% |
| 86.76 | 1 | < 0.1% |
| 85.28 | 1 | < 0.1% |
| 85.06 | 1 | < 0.1% |
trip_duration
Unsupported
Rejected Unsupported
| Missing | 0 |
|---|---|
| Missing (%) | 0.0% |
| Memory size | 426.1 KiB |
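trip_duration is rejected as an unsupported type. Judging from the Sample section, where it appears as values such as "0 days 00:16:43", it looks like a timedelta (dropoff minus pickup), with duration_minutes carrying the same quantity as a plain number. The sketch below shows that conversion with the standard library only; the helper name `minutes_between` and the timestamp format string are assumptions for illustration.

```python
from datetime import datetime

def minutes_between(pickup, dropoff, fmt="%Y-%m-%d %H:%M:%S"):
    """Return the elapsed time between two timestamp strings, in minutes."""
    delta = datetime.strptime(dropoff, fmt) - datetime.strptime(pickup, fmt)
    return delta.total_seconds() / 60.0

# First row of the Sample section: trip_duration "0 days 00:16:43".
value = minutes_between("2017-04-15 23:32:20", "2017-04-15 23:49:03")
print(round(value, 6))  # 16.716667

# A dropoff earlier than the pickup yields a negative duration.
print(minutes_between("2017-01-01 10:00:00", "2017-01-01 09:30:00"))  # -30.0
```

Keeping only the numeric column and dropping the timedelta column before profiling would avoid the Unsupported alert, if that matches the intent of the analysis.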
duration_minutes
Real number (ℝ)
High correlation Skewed
| Distinct | 2270 |
|---|---|
| Distinct (%) | 21.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 16.767031 |
| Minimum | -16.983333 |
| Maximum | 1439.55 |
| Zeros | 16 |
| Zeros (%) | 0.1% |
| Negative | 1 |
| Negative (%) | < 0.1% |
| Memory size | 426.1 KiB |
Quantile statistics
| Minimum | -16.983333 |
|---|---|
| 5-th percentile | 2.95 |
| Q1 | 6.5666667 |
| median | 10.916667 |
| Q3 | 17.516667 |
| 95-th percentile | 33.486667 |
| Maximum | 1439.55 |
| Range | 1456.5333 |
| Interquartile range (IQR) | 10.95 |
Descriptive statistics
| Standard deviation | 67.381783 |
|---|---|
| Coefficient of variation (CV) | 4.018707 |
| Kurtosis | 420.26994 |
| Mean | 16.767031 |
| Median Absolute Deviation (MAD) | 5.0333333 |
| Skewness | 20.27968 |
| Sum | 180362.95 |
| Variance | 4540.3047 |
| Monotonicity | Not monotonic |
Common values
| Value | Count | Frequency (%) |
|---|---|---|
| 5.383333333 | 20 | 0.2% |
| 7.05 | 18 | 0.2% |
| 8.566666667 | 18 | 0.2% |
| 6.1 | 18 | 0.2% |
| 9.383333333 | 18 | 0.2% |
| 3.783333333 | 18 | 0.2% |
| 7.566666667 | 18 | 0.2% |
| 5.1 | 17 | 0.2% |
| 5.316666667 | 17 | 0.2% |
| 8.583333333 | 17 | 0.2% |
| Other values (2260) | 10578 | 98.3% |
Minimum 10 values
| Value | Count | Frequency (%) |
|---|---|---|
| -16.98333333 | 1 | < 0.1% |
| 0 | 16 | 0.1% |
| 0.03333333333 | 5 | < 0.1% |
| 0.05 | 7 | 0.1% |
| 0.06666666667 | 1 | < 0.1% |
| 0.08333333333 | 4 | < 0.1% |
| 0.1 | 5 | < 0.1% |
| 0.1333333333 | 4 | < 0.1% |
| 0.15 | 3 | < 0.1% |
| 0.1666666667 | 2 | < 0.1% |
Maximum 10 values
| Value | Count | Frequency (%) |
|---|---|---|
| 1439.55 | 1 | < 0.1% |
| 1439.15 | 1 | < 0.1% |
| 1438.65 | 1 | < 0.1% |
| 1438.55 | 1 | < 0.1% |
| 1438.466667 | 1 | < 0.1% |
| 1438.266667 | 1 | < 0.1% |
| 1437.833333 | 1 | < 0.1% |
| 1436.5 | 1 | < 0.1% |
| 1435.8 | 1 | < 0.1% |
| 1433.983333 | 1 | < 0.1% |
Interactions
Correlations
| | DOLocationID | PULocationID | RatecodeID | duration_minutes | extra | fare_amount | passenger_count | payment_type | tip_amount | tolls_amount | total_amount | trip_distance |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| DOLocationID | 1.000 | 0.110 | 0.115 | -0.066 | 0.132 | -0.073 | 0.010 | 0.036 | -0.006 | -0.007 | -0.065 | -0.076 |
| PULocationID | 0.110 | 1.000 | 0.138 | -0.062 | 0.153 | -0.073 | -0.006 | 0.020 | -0.014 | -0.052 | -0.065 | -0.074 |
| RatecodeID | 0.115 | 0.138 | 1.000 | 0.000 | 0.707 | 0.757 | 0.000 | 0.000 | 0.259 | 0.278 | 0.599 | 0.619 |
| duration_minutes | -0.066 | -0.062 | 0.000 | 1.000 | 0.009 | 0.960 | 0.022 | 0.000 | 0.381 | 0.259 | 0.940 | 0.843 |
| extra | 0.132 | 0.153 | 0.707 | 0.009 | 1.000 | 0.587 | 0.016 | 0.000 | 0.259 | 0.276 | 0.538 | 0.505 |
| fare_amount | -0.073 | -0.073 | 0.757 | 0.960 | 0.587 | 1.000 | 0.022 | 0.039 | 0.400 | 0.299 | 0.978 | 0.935 |
| passenger_count | 0.010 | -0.006 | 0.000 | 0.022 | 0.016 | 0.022 | 1.000 | 0.026 | -0.023 | 0.016 | 0.015 | 0.030 |
| payment_type | 0.036 | 0.020 | 0.000 | 0.000 | 0.000 | 0.039 | 0.026 | 1.000 | 0.122 | 0.012 | 0.089 | 0.026 |
| tip_amount | -0.006 | -0.014 | 0.259 | 0.381 | 0.259 | 0.400 | -0.023 | 0.122 | 1.000 | 0.178 | 0.542 | 0.385 |
| tolls_amount | -0.007 | -0.052 | 0.278 | 0.259 | 0.276 | 0.299 | 0.016 | 0.012 | 0.178 | 1.000 | 0.311 | 0.295 |
| total_amount | -0.065 | -0.065 | 0.599 | 0.940 | 0.538 | 0.978 | 0.015 | 0.089 | 0.542 | 0.311 | 1.000 | 0.913 |
| trip_distance | -0.076 | -0.074 | 0.619 | 0.843 | 0.505 | 0.935 | 0.030 | 0.026 | 0.385 | 0.295 | 0.913 | 1.000 |
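The High correlation alerts in the Overview can be reproduced from this matrix by listing, for each variable, the other variables whose absolute coefficient reaches a cutoff. The cutoff of 0.5 used below is an assumption: it is not stated in the report, but it matches every alert listed. The function name `high_correlations` is hypothetical.

```python
names = [
    "DOLocationID", "PULocationID", "RatecodeID", "duration_minutes",
    "extra", "fare_amount", "passenger_count", "payment_type",
    "tip_amount", "tolls_amount", "total_amount", "trip_distance",
]

# Rows copied from the correlation table above, in the same order as `names`.
matrix = [
    [1.000, 0.110, 0.115, -0.066, 0.132, -0.073, 0.010, 0.036, -0.006, -0.007, -0.065, -0.076],
    [0.110, 1.000, 0.138, -0.062, 0.153, -0.073, -0.006, 0.020, -0.014, -0.052, -0.065, -0.074],
    [0.115, 0.138, 1.000, 0.000, 0.707, 0.757, 0.000, 0.000, 0.259, 0.278, 0.599, 0.619],
    [-0.066, -0.062, 0.000, 1.000, 0.009, 0.960, 0.022, 0.000, 0.381, 0.259, 0.940, 0.843],
    [0.132, 0.153, 0.707, 0.009, 1.000, 0.587, 0.016, 0.000, 0.259, 0.276, 0.538, 0.505],
    [-0.073, -0.073, 0.757, 0.960, 0.587, 1.000, 0.022, 0.039, 0.400, 0.299, 0.978, 0.935],
    [0.010, -0.006, 0.000, 0.022, 0.016, 0.022, 1.000, 0.026, -0.023, 0.016, 0.015, 0.030],
    [0.036, 0.020, 0.000, 0.000, 0.000, 0.039, 0.026, 1.000, 0.122, 0.012, 0.089, 0.026],
    [-0.006, -0.014, 0.259, 0.381, 0.259, 0.400, -0.023, 0.122, 1.000, 0.178, 0.542, 0.385],
    [-0.007, -0.052, 0.278, 0.259, 0.276, 0.299, 0.016, 0.012, 0.178, 1.000, 0.311, 0.295],
    [-0.065, -0.065, 0.599, 0.940, 0.538, 0.978, 0.015, 0.089, 0.542, 0.311, 1.000, 0.913],
    [-0.076, -0.074, 0.619, 0.843, 0.505, 0.935, 0.030, 0.026, 0.385, 0.295, 0.913, 1.000],
]

THRESHOLD = 0.5  # assumed cutoff; not stated in the report

def high_correlations(names, matrix, threshold):
    """Map each variable to the other variables with |coefficient| >= threshold."""
    flagged = {}
    for i, name in enumerate(names):
        partners = [
            names[j]
            for j, coefficient in enumerate(matrix[i])
            if j != i and abs(coefficient) >= threshold
        ]
        if partners:
            flagged[name] = partners
    return flagged

flagged = high_correlations(names, matrix, THRESHOLD)
for name, partners in flagged.items():
    print(f"{name}: {', '.join(partners)}")
```

With this cutoff, seven variables are flagged, and tip_amount is flagged only against total_amount, in line with the alerts.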
Missing values
Sample
First rows
| | tpep_pickup_datetime | tpep_dropoff_datetime | passenger_count | trip_distance | RatecodeID | PULocationID | DOLocationID | payment_type | fare_amount | extra | mta_tax | tip_amount | tolls_amount | improvement_surcharge | total_amount | trip_duration | duration_minutes |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 4 | 2017-04-15 23:32:20 | 2017-04-15 23:49:03 | 1 | 4.37 | 1 | 4 | 112 | 2 | 16.5 | 0.5 | 0.5 | 0.00 | 0.0 | 0.3 | 17.80 | 0 days 00:16:43 | 16.716667 |
| 5 | 2017-03-25 20:34:11 | 2017-03-25 20:42:11 | 6 | 2.30 | 1 | 161 | 236 | 1 | 9.0 | 0.5 | 0.5 | 2.06 | 0.0 | 0.3 | 12.36 | 0 days 00:08:00 | 8.000000 |
| 6 | 2017-05-03 19:04:09 | 2017-05-03 20:03:47 | 1 | 12.83 | 1 | 79 | 241 | 1 | 47.5 | 1.0 | 0.5 | 9.86 | 0.0 | 0.3 | 59.16 | 0 days 00:59:38 | 59.633333 |
| 7 | 2017-08-15 17:41:06 | 2017-08-15 18:03:05 | 1 | 2.98 | 1 | 237 | 114 | 1 | 16.0 | 1.0 | 0.5 | 1.78 | 0.0 | 0.3 | 19.58 | 0 days 00:21:59 | 21.983333 |
| 12 | 2017-06-09 19:00:26 | 2017-06-09 19:20:11 | 1 | 3.00 | 1 | 13 | 148 | 1 | 15.0 | 1.0 | 0.5 | 3.35 | 0.0 | 0.3 | 20.15 | 0 days 00:19:45 | 19.750000 |
| 13 | 2017-11-06 23:35:05 | 2017-11-06 23:42:57 | 1 | 2.39 | 1 | 209 | 25 | 1 | 9.5 | 0.5 | 0.5 | 2.16 | 0.0 | 0.3 | 12.96 | 0 days 00:07:52 | 7.866667 |
| 16 | 2017-08-15 19:48:08 | 2017-08-15 20:00:37 | 1 | 3.60 | 1 | 163 | 41 | 1 | 12.5 | 1.0 | 0.5 | 2.85 | 0.0 | 0.3 | 17.15 | 0 days 00:12:29 | 12.483333 |
| 18 | 2017-04-10 18:12:58 | 2017-04-10 18:17:39 | 2 | 0.63 | 1 | 263 | 262 | 2 | 5.0 | 1.0 | 0.5 | 0.00 | 0.0 | 0.3 | 6.80 | 0 days 00:04:41 | 4.683333 |
| 19 | 2017-03-05 04:01:07 | 2017-03-05 04:14:11 | 2 | 2.77 | 1 | 79 | 68 | 1 | 11.5 | 0.5 | 0.5 | 3.20 | 0.0 | 0.3 | 16.00 | 0 days 00:13:04 | 13.066667 |
| 20 | 2017-12-30 23:52:44 | 2017-12-30 23:58:57 | 1 | 1.10 | 1 | 166 | 238 | 2 | 6.5 | 0.5 | 0.5 | 0.00 | 0.0 | 0.3 | 7.80 | 0 days 00:06:13 | 6.216667 |
Last rows
| | tpep_pickup_datetime | tpep_dropoff_datetime | passenger_count | trip_distance | RatecodeID | PULocationID | DOLocationID | payment_type | fare_amount | extra | mta_tax | tip_amount | tolls_amount | improvement_surcharge | total_amount | trip_duration | duration_minutes |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 22681 | 2017-06-09 18:24:49 | 2017-06-09 18:36:15 | 1 | 1.79 | 1 | 234 | 144 | 1 | 9.5 | 1.0 | 0.5 | 1.00 | 0.00 | 0.3 | 12.30 | 0 days 00:11:26 | 11.433333 |
| 22683 | 2017-08-03 17:30:04 | 2017-08-03 17:41:52 | 1 | 1.17 | 1 | 107 | 170 | 1 | 8.5 | 1.0 | 0.5 | 2.06 | 0.00 | 0.3 | 12.36 | 0 days 00:11:48 | 11.800000 |
| 22684 | 2017-08-03 16:36:32 | 2017-08-03 16:46:23 | 2 | 1.20 | 1 | 68 | 50 | 1 | 8.0 | 1.0 | 0.5 | 1.95 | 0.00 | 0.3 | 11.75 | 0 days 00:09:51 | 9.850000 |
| 22685 | 2017-07-05 22:42:46 | 2017-07-05 22:49:29 | 1 | 1.01 | 1 | 144 | 79 | 1 | 6.5 | 0.5 | 0.5 | 1.56 | 0.00 | 0.3 | 9.36 | 0 days 00:06:43 | 6.716667 |
| 22686 | 2017-02-08 18:13:26 | 2017-02-08 19:34:11 | 5 | 10.64 | 1 | 170 | 70 | 1 | 52.0 | 1.0 | 0.5 | 14.84 | 5.54 | 0.3 | 74.18 | 0 days 01:20:45 | 80.750000 |
| 22688 | 2017-08-05 21:23:29 | 2017-08-05 21:26:11 | 3 | 0.44 | 1 | 230 | 163 | 2 | 4.0 | 0.5 | 0.5 | 0.00 | 0.00 | 0.3 | 5.30 | 0 days 00:02:42 | 2.700000 |
| 22691 | 2017-01-06 01:50:14 | 2017-01-06 01:56:47 | 1 | 2.12 | 1 | 170 | 79 | 1 | 8.0 | 0.5 | 0.5 | 0.00 | 0.00 | 0.3 | 9.30 | 0 days 00:06:33 | 6.550000 |
| 22692 | 2017-07-16 03:22:51 | 2017-07-16 03:40:52 | 1 | 5.70 | 1 | 249 | 17 | 1 | 19.0 | 0.5 | 0.5 | 4.05 | 0.00 | 0.3 | 24.35 | 0 days 00:18:01 | 18.016667 |
| 22693 | 2017-08-10 22:20:04 | 2017-08-10 22:29:31 | 1 | 0.89 | 1 | 229 | 170 | 1 | 7.5 | 0.5 | 0.5 | 1.76 | 0.00 | 0.3 | 10.56 | 0 days 00:09:27 | 9.450000 |
| 22694 | 2017-02-24 17:37:23 | 2017-02-24 17:40:39 | 3 | 0.61 | 1 | 48 | 186 | 2 | 4.0 | 1.0 | 0.5 | 0.00 | 0.00 | 0.3 | 5.80 | 0 days 00:03:16 | 3.266667 |
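In the sample rows, total_amount appears to equal the sum of fare_amount, extra, mta_tax, tip_amount, tolls_amount, and improvement_surcharge, which would help explain why total_amount is flagged as highly correlated with several of those fields. This is an observation from the rows shown, not something the report states. A minimal check on four of the sample rows, with the hypothetical helper `components_match`:

```python
import math

# (fare_amount, extra, mta_tax, tip_amount, tolls_amount,
#  improvement_surcharge, total_amount) copied from sample rows
# with index 4, 5, 6, and 22686.
rows = [
    (16.5, 0.5, 0.5, 0.00, 0.00, 0.3, 17.80),
    (9.0, 0.5, 0.5, 2.06, 0.00, 0.3, 12.36),
    (47.5, 1.0, 0.5, 9.86, 0.00, 0.3, 59.16),
    (52.0, 1.0, 0.5, 14.84, 5.54, 0.3, 74.18),
]

def components_match(row, tolerance=0.005):
    """Check whether the charge components add up to total_amount."""
    *components, total = row
    return math.isclose(sum(components), total, abs_tol=tolerance)

print(all(components_match(row) for row in rows))  # True
```

The tolerance of half a cent only absorbs floating-point rounding; a full check would need to run over every row of the dataset.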